Abstract

Answer Set Programming (ASP) is a well-established paradigm of declarative program- 
ming in close relationship with other declarative formalisms such as SAT Modulo Theories, 
Constraint Handling Rules, FO(.), PDDL and many others. Since its first informal edi- 
tions, ASP systems have been compared in the now well-established ASP Competition. 
The Third (Open) ASP Competition, as the sequel to the ASP Competitions Series held 
at the University of Potsdam in Germany (2006-2007) and at the University of Leuven in 
Belgium in 2009, took place at the University of Calabria (Italy) in the first half of 2011. 
Participants competed on a pre-selected collection of benchmark problems, taken from a 
variety of domains as well as real world applications. 

The Competition ran on two tracks: the Model and Solve (M&S) Track, based on an 
open problem encoding, and open language, and open to any kind of system based on a 
declarative specification paradigm; and the System Track, run on the basis of fixed, public 
problem encodings, written in a standard ASP language. This paper discusses the format 
of the Competition and the rationale behind it, then reports the results for both tracks. 
Finally, the results are compared with the Second ASP Competition and with
state-of-the-art solutions for some of the benchmark domains.

KEYWORDS: Answer Set Programming, Logic Programming, Declarative languages, Ar- 
tificial Intelligence Competitions 



1 Introduction 

Answer Set Programming (ASP) is a declarative approach to computer programming
with roots in the area of nonmonotonic reasoning and logic programming
(Gelfond and Lifschitz 1991; Niemelä 1999; Marek and Truszczyński 1999). The main



advantage of ASP¹ is its highly declarative nature combined with a relatively high
expressive power (Dantsin et al. 2001). After some pioneering work (Bell et al. 1994;
Subrahmanian et al. 1995), nowadays there are a number of systems that support
ASP and its variants (Anger et al. 2005; Dal Palù et al. 2009; Gebser et al. 2007;
Janhunen and Niemelä 2004; Lefèvre and Nicolas 2009b; Leone et al. 2006;
Lierler and Maratea 2004; Lin and Zhao 2004; Simons et al. 2002). The availability
of some efficient systems

1 For introductory material on ASP, the reader might refer to (Baral 2003; Eiter et al. 2009).
makes ASP a powerful tool for developing advanced applications in several fields,
ranging from Artificial Intelligence (Balduccini et al. 2001; Baral and Gelfond 2000;
Baral and Uyan 2001; Friedrich and Ivanchenko 2008; Franconi et al. 2001;
Nogueira et al. 2001; Wasp 2003) to Information Integration (Leone et al. 2005;
Marileo and Bertossi 2010), Knowledge Management (Baral 2003; Bardadym 1996;
Grasso et al. 2009), and Bioinformatics (Palopoli et al. 2005; Dovier 2011;
Gebser et al. 2011), and has also stimulated some interest in industry
(Grasso et al. 2010; Ricca et al. 2010).

ASP systems are evaluated in the now well-established ASP Competitions, which
started with two informal trials at ASP Dagstuhl meetings in 2002 and 2005. The
present competition, held at the University of Calabria (Italy), is the third official
edition, since the rules of the contest were formalized and implemented in the first
two "official" ASP Competitions (Gebser et al. 2007; Denecker et al. 2009). Besides
comparing ASP systems with each other, one of the goals of the competition is to 
benchmark similar systems and declarative paradigms close in spirit to ASP. To this 
end, the Third ASP Competition featured two tracks: the Model and Solve Com- 
petition Track (M&S Track from now on), based on an open problem encoding, 
open language basis, and open to any system based on a declarative specification 
paradigm; and the System Competition Track, based on fixed problem encodings
written in a standard ASP language. The M&S Competition Track essentially
follows the direction of the previous ASP Competition; the System Competition 
Track (System Track from now on) was conceived in order to compare participant 
systems on the basis of fixed input language and fixed conditions. 

A preliminary work reporting results of the System Track appeared in
(Calimeri et al. 2011b); this paper extends that work in the following respects:

• detailed results of the System Track, which now include non-participant
systems such as parallel solvers and some latecomers;

• description of the problem categories, including those appearing in the M&S 
Track only; 

• discussion of the rationale and the rules of the M&S Track, presentation of 
competitors and results of the Track; 

• a number of comparisons; namely, we show: 

— how the winner of the Second ASP Competition performed on this edi- 
tion's benchmarks; 

— whenever applicable, how participants to this edition performed on the 
former Competition benchmarks; 

— for systems to which this is applicable, whether and how performance 
changed when switching from the System Track settings to the more 
liberal settings of the M&S competition; 

— for a selection of benchmarks, how the participants performed against 
some known state-of-the-art solutions; these ranged from specialized
algorithms/systems to tailored ad-hoc solutions based on constraint
programming and/or SAT.



The remainder of the paper is structured as follows: in Section [2] we discuss the 



The Third Open Answer Set Programming Competition 




Competition rationale, the subsequent format and regulations for both Tracks, and 
we briefly overview the standard language adopted in the System Track; Section [3] 
illustrates the classes of declarative languages participating in the M&S Track and 
the classes of evaluation techniques adopted (particularly focusing on the System 
Track), and presents the participants in the Competition; in Section [4] we illustrate 
the scoring criteria, the benchmark suite and other competition settings; Section [5] 
reports and discusses the actual results of the Competition; in Section [6] we report
details about comparisons of participants with a number of yardstick problem
solutions and/or systems; finally, conclusions are drawn in Section [7]. An electronic
appendix details the above whenever appropriate.

2 Competition Format 

In this Section we describe the Competition format for the System and M&S Track, 
thoroughly discussing motivations and purposes that led to the choices made: the 
two tracks differ in regulations and design principles. It must be observed that the 
System Track resembles competitions of neighboring communities in spirit, and is 
indeed played on a fixed input language, with fixed input problem specifications. 
However, both tracks introduce specific aspects related to the ASP philosophy. 
As a main difference, note that competitions close to the ASP community (e.g.
SAT, CASC, IPC) are run on a set of pairs (i, S), for i an input instance and S
a participant solver. Instead, the ASP competition has problem specifications as a
further variable. ASP Competitions can be seen as played on a set of triples (i, p, S):
here, i is an input instance, p a problem specification, and S a solver. Depending
on track regulations, i, p and S are subject to specific constraints.
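
The difference between the two views can be sketched in a few lines of Python.
This is an illustrative sketch only: the `Run` record, the function names, and the
sample instance, encoding and solver labels are hypothetical and are not part of
the competition infrastructure.

```python
from itertools import product
from typing import NamedTuple


class Run(NamedTuple):
    instance: str   # i: an input instance
    encoding: str   # p: a problem specification
    solver: str     # S: a solver


def system_track_runs(instances, fixed_encoding, solvers):
    """System Track view: p is fixed by the organizers, only i and S vary."""
    return [Run(i, fixed_encoding, s) for i, s in product(instances, solvers)]


def ms_track_runs(instances, bundles):
    """M&S Track view: each team brings its own (encoding, solver) bundle."""
    return [Run(i, enc, solver) for i in instances for enc, solver in bundles]


# Hypothetical labels, for illustration only.
instances = ["inst1", "inst2"]
sys_runs = system_track_runs(instances, "organizers.enc", ["solverA", "solverB"])
ms_runs = ms_track_runs(instances, [("teamA.enc", "solverA"), ("teamB.enc", "solverB")])
```

In the first list every run shares the same encoding, so score differences can be
attributed to the solver; in the second, encoding and solver vary together.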

System Track Format. The regulations of the System Track were conceived 
taking into account two main guidelines. As a first remark, it must be observed 
that ASP is still missing a standard high-level input language, in contrast with 
other similar declarative paradigms.² It was thus important to play on the grounds
of a common language, despite restriction to commonly acknowledged constructs 
only. As a second guideline, it has been taken into account that the outcome of the 
System Track should give a fairly objective measure of what one can expect when 
switching from one system to another, while keeping all other conditions fixed, 
such as the problem encoding and solver settings. In accordance with the above, 
the System Track was held on the basis of the following rules. 

1. The Track was open to systems able to parse input written in a fixed language 
format, called ASP-Core; 

2. For each benchmark problem, the organizers chose a fixed ASP-Core specifi- 
cation: each system had to use this specification compulsorily for solving the 
problem at hand; 

2 These range from the Satisfiability Modulo Theories SMT-LIB format (smt-lib-web 2011), the
Planning Domain Definition Language (PDDL) (Gerevini and Long 2005), the TPTP format
used in the CASC Automated Theorem Proving System Competitions (CADE-ATP 2011), to
the Constraint Handling Rules (CHR) family (CHR 2004).
3. Syntactic special-purpose solving techniques, e.g. recognizing a problem from
file names, predicate names etc., were forbidden.

The detailed rules and the definition of "syntactic technique" are reported in
Appendix B.

The language ASP-Core. ASP-Core is a rule-based language, its syntax stemming
from plain Datalog and Prolog: it is a conservative extension to the nonground case
of the Core language adopted in the First ASP Competition; it complies with the
core language draft specified at LPNMR 2004 (ASP-Draft 2004), and refers to the
language specified in the seminal paper (Gelfond and Lifschitz 1991). Its reduced
set of constructs is nowadays common for ASP parsers and can be supported by
any existing system with very minor implementation effort.³ ASP-Core features
disjunction in the rule heads, both strong and negation-as-failure (NAF) negation
in rule bodies, as well as nonground rules. A detailed overview of ASP-Core is
reported in Appendix A; the full ASP-Core language specification can be found in
(Calimeri et al. 2011a).
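
To make the listed features concrete, the following minimal Python sketch
enumerates the answer sets of a small ground program with head disjunction and
NAF, following the reduct-based definition of the seminal paper cited above. It
is a brute-force illustration and not how any participant system works: the tuple
representation of rules and all function names are assumptions of this sketch,
and strong negation and nonground rules are left out for brevity.

```python
from itertools import chain, combinations


def rule(head=(), pos=(), neg=()):
    """A ground rule: disjunctive head, positive body atoms, NAF body atoms."""
    return (frozenset(head), frozenset(pos), frozenset(neg))


def subsets(atoms):
    items = sorted(atoms)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def reduct(program, candidate):
    """Drop rules whose NAF part is contradicted by the candidate,
    then erase the NAF literals from the remaining rules."""
    return [(h, p) for h, p, n in program if not (n & candidate)]


def is_model(positive_program, interp):
    # A rule holds if its body is not satisfied or some head atom is true.
    return all(not p <= interp or bool(h & interp) for h, p in positive_program)


def answer_sets(program):
    atoms = frozenset().union(*(h | p | n for h, p, n in program))
    found = []
    for cand in subsets(atoms):
        red = reduct(program, cand)
        if not is_model(red, cand):
            continue
        # The candidate must be a minimal model of its own reduct.
        if any(is_model(red, s) for s in subsets(cand) if s != cand):
            continue
        found.append(cand)
    return found


# a | b.        c :- a, not d.
program = [rule(head={"a", "b"}), rule(head={"c"}, pos={"a"}, neg={"d"})]
models = answer_sets(program)
```

For the two-rule program in the sketch, the enumeration yields the two answer
sets {b} and {a, c}.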

Model and Solve Track Format. The regulations of the M&S Track take into 
account the experience coming from the previous ASP Competitions. As driving 
principles in the design of the M&S Track regulations we can list: encouraging the 
development of new expressive declarative constructs and/or new modeling para- 
digms; fostering the exchange of ideas between communities in close relationships 
with ASP; and, stimulating the development of new ad-hoc solving methods, and 
refined problem specifications and heuristics, on a per benchmark domain basis. In 
the light of the above, the M&S Track was held under the following rules: 

1. the competition organizers made a set of problem specifications public,
together with a set of test instances, the latter expressed in a common instance
input format;

2. for each problem, teams were allowed to submit a specific solution bundle, 
based on a solver (or a combination of solvers) of choice, and a problem 
encoding; 

3. any submitted solution bundle was required to be mainly based on a declar- 
ative specification language. 

3 Participants 

In this section we present the participants in the competition, categorized by the 
adopted modeling paradigms and their evaluation techniques. 

3 During competition activities, we also developed a larger language proposal, called ASP-RfC 
(Request for Comments), including aggregates and other widely used, but not yet standardized, 
language features. 
3.1 System Track

The participants in the System Track were only ASP-based systems. The traditional
approach to ASP program evaluation follows an instance processing work-flow
composed of a grounding module, generating a propositional theory, coupled with
a subsequent propositional solver module. There have been other attempts deviating
from this customary approach (Dal Palù et al. 2009; Lefèvre and Nicolas 2009a;
Lefèvre and Nicolas 2009b); nonetheless all the participants adopted the canonical
"ground & solve" strategy. In order to deal with non-ground programs, all
solvers relied on the grounder Gringo (Gebser et al. 2007). In detail, the
System Track had eleven official participants and five noncompeting systems. These
can be classified according to the employed evaluation strategy as follows:

Native ASP: featuring custom propositional search techniques that are based on
backtracking algorithms tailored for dealing with logic programs. To this class
belong: clasp (Gebser et al. 2009), claspD (Drescher et al. 2008), claspfolio
(Gebser et al. 2011), Aclasp (Aclasp 2011), Smodels (Simons et al. 2002), idp
(Wittocx et al. 2008), and the non-competing clasp-mt (Ellguth et al. 2009). clasp
features techniques from the area of boolean constraint solving, and its primary
algorithm relies on conflict-driven nogood learning. claspD is an extension of clasp
that is able to solve unrestricted disjunctive logic programs, while claspfolio
exploits machine-learning techniques in order to choose the best-suited
configuration of clasp to process the given input program; Aclasp (non-participant
system) is a variant of clasp employing a different restart strategy; clasp-mt is a
multi-threaded version of clasp. IDP is a finite model generator for extended
first-order logic theories. Finally, Smodels, one of the first robust ASP systems
that have been made available to the community, was included in the competition
for comparison purposes, given its historical importance.

SAT-Based: employing translation techniques (e.g., completion (Fages 1994), loop
formulas (Lee and Lifschitz 2003; Lin and Zhao 2004), nonclausal constraints
(Lierler 2008)) to enforce correspondence between answer sets and satisfying
assignments of SAT formulas so that state-of-the-art SAT solvers can be used for
computing answer sets. To this class belong: CMODELS (Lierler and Maratea 2004),
SUP (Lierler 2008) and three variants of lp2sat (Janhunen 2006): lp2gminisat,
lp2lminisat and lp2minisat. In detail, CMODELS can handle disjunctive logic
programs and exploits a SAT solver as a search engine for enumerating models, and
also for verifying model minimality whenever needed; SUP makes use of nonclausal
constraints, and can be seen as a combination of the computational ideas behind
CMODELS and Smodels; the LP2SAT family of solvers, where the trailing g and l
account for the presence of variants of the basic strategy, employed MiniSat
(Eén and Sörensson 2003).

Difference Logic-based: exploiting a translation (Janhunen et al. 2009) from ASP
propositional programs to Difference Logic (DL) theories (Nieuwenhuis and Oliveras 2005)
to perform the computation of answer sets via Satisfiability Modulo Theories
(Nieuwenhuis et al. 2006) solvers. To this class belong lp2diffz3 and its three
non-competing variants, namely: lp2diffgz3, lp2difflz3, and lp2difflgz3. The
lp2diff solver family (Janhunen et al. 2009) translates ground ASP programs into
the QF_IDL dialect (difference logic over integers) of the SMT library
(smt-lib-web 2011); the trailing g, l and lg letters account for different variants
of the basic translation technique. The LP2DIFF family had Z3
(de Moura and Bjørner 2008) as underlying SMT solver.
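
The completion-based translation mentioned for the SAT-based systems can be
illustrated on tiny ground programs. The sketch below is a toy under stated
assumptions: it enumerates truth assignments by brute force instead of calling a
SAT solver, and the rule encoding and function names are hypothetical. It lists
the models of Clark's completion, which coincide with the answer sets for the
first program but not for the second, where the self-supporting loop on p is the
kind of model that loop formulas are meant to exclude.

```python
from itertools import product


def body_holds(pos, neg, val):
    """A body is true if all positive atoms hold and no NAF atom holds."""
    return all(val[a] for a in pos) and not any(val[a] for a in neg)


def completion_models(program, atoms):
    """Models of Clark's completion of a ground normal program:
    each atom is true exactly when the body of one of its rules is true."""
    ordered = sorted(atoms)
    models = []
    for bits in product([False, True], repeat=len(ordered)):
        val = dict(zip(ordered, bits))
        supported = {a: any(body_holds(p, n, val)
                            for h, p, n in program if h == a)
                     for a in ordered}
        if all(val[a] == supported[a] for a in ordered):
            models.append({a for a in ordered if val[a]})
    return models


# Rules are (head, positive body, NAF body); an assumed encoding.
choice = [("p", [], ["q"]), ("q", [], ["p"])]   # p :- not q.   q :- not p.
loop = [("p", ["p"], [])]                        # p :- p.

choice_models = completion_models(choice, {"p", "q"})
loop_models = completion_models(loop, {"p"})
```

Here `choice_models` contains {q} and {p}, which are exactly the answer sets of
the first program, while `loop_models` contains both the empty set and {p},
although only the empty set is an answer set of the second program.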



3.2 M&S Track 

The M&S Competition Track was held on an open problem encoding, open language
basis. Thus participants adopted several different declarative paradigms,
which roughly belong to the following families of languages: ASP-based, adopting
ASP (Gelfond and Lifschitz 1991) (and variants) as modeling language;
FO(.)-based, employing FO(ID) (Denecker and Ternovska 2008); CLP-based, using logic
programming as declarative middle-ware language for reasoning on constraint
satisfaction problems (Jaffar and Lassez 1987); and Planning-based, adopting PDDL
(Planning Domain Definition Language) as modeling language (PDDL 3.1 2008).
In detail, six teams participated in the M&S Track:



Potassco: The Potassco team from the University of Potsdam, Germany (Gebser et al. 2007)
submitted a heterogeneous ASP-based solution bundle. Depending on the benchmark
problem, Potassco employed Gringo (Gebser et al. 2007) coupled with either
clasp (Gebser et al. 2009) or claspD (Drescher et al. 2008), and Clingcon, which is
an answer set solver for constraint logic programs, built upon the Clingo system
and the CSP solver Gecode (GECODE 2011), embedding and extending Gringo for
grounding.

Aclasp: The team exploited the same ASP-based solutions provided by the Potassco
team, and participated only in a number of selected problem domains. The solver
of choice was Aclasp (Aclasp 2011), a modified version of clasp that features a
different restart-strategy, depending on the average decision-level on which conflicts
occurred. The grounder of choice was Gringo.



idp: The IDP (Wittocx et al. 2008) team, from the Knowledge Representation and
Reasoning (KRR) research group of K.U.Leuven, Belgium, proposed FO(.)-based
solutions. In particular, problem solutions were formulated in the FO(.) input
language, and the problem instances were solved by MiniSatID (Mariën et al. 2008)
on top of the grounder Gidl (Wittocx et al. 2008). A preprocessing script was
used to rewrite ASP instances into FO(.) structures.

EZCSP: EZCSP is an Eastman Kodak Company and University of Kentucky joint
team. The team is interested in evaluating and comparing ASP and hybrid languages
on challenging industrial-sized domains. EZCSP (Balduccini 2009c) is also
the name of both the CLP-based modeling language, featuring a lightweight
integration between ASP and Constraint Programming (CP), and the solver employed
by this team. The EZCSP system supports the free combination of different ASP
and CP solvers which can be selected as sub-solvers, according to the features of
the target domain. In particular, the team exploited the following ASP solvers
depending on the benchmark at hand: clasp, iClingo and ASPM (Balduccini 2009a).
Moreover, in cases where CP constraints were used, the team selected B-Prolog as
solver.

BPSolver: This team adopted a CLP-based modeling language and exploited the
B-Prolog system (Zhou 2011) for implementing solutions. BPSolver employed either
pure Prolog, tabling techniques (Chen and Warren 1996), or CLP(FD) (Hentenryck 1989)
depending on the problem at hand. In particular, apart from a few problems that
required only plain Prolog, all the provided solutions were based on either CLP(FD)
or tabling.

Fast Downward: This is a joint team from several international research institutions
that proposed some planning-based solutions. Benchmark domains were statically
modeled as planning domains, and problem instances were automatically translated
from ASP to PDDL. The submitted solutions were based on Fast Downward
(Helmert 2006), a planning system developed in the automated planning community.
The team restricted its participation to benchmarks that could be easily seen
as planning problems and exploited a number of different configurations/heuristics
of Fast Downward.



4 Competition Settings 

We now briefly describe the scoring methodology of choice, the selected benchmarks 
and other practical settings in which the competition was run. A detailed description
of general settings, the scoring criteria and the selected benchmark suite can be 
respectively found in Appendices B, C and D. 

Scoring System. The scoring framework is a refinement of the one adopted in the
first and second ASP Competitions. In these former editions, scoring rules were
mainly based on a weighted sum of the number of instances solved within a given
time-bound; in this edition, the scoring framework has been extended by awarding
additional points to systems performing well in terms of evaluation time. For search
and query problems, each system on benchmark problem P was awarded the score
$S(P) = S_{solve}(P) + S_{time}(P)$. $S_{solve}$ and $S_{time}$ could range from 0
to 50 each: while $S_{solve}$ is linearly dependent on the number of instances solved
in the allotted time, $S_{time}$ contains a logarithmic dependence on participants'
running times, thus making time differences within the same order of magnitude less
significant. As for optimization problems, the $S_{solve}$ quota was replaced with a
scoring formula that also takes into account the solution quality: the closer a
solution is to the optimal cost, the exponentially better its score.
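
The two-part score for search and query problems can be sketched as follows. This
is a minimal illustration and not the official definition, which is given in
Appendix C: the exact logarithmic form of the time term, the function names, and
the choice of letting unsolved instances contribute nothing are assumptions made
to match the description above, with each part capped at 50 points and a
600-second time limit.

```python
import math

ALPHA = 50.0  # points available for each of the two parts, as stated in the text


def s_solve(num_solved, num_instances):
    """Linear in the number of instances solved within the time limit."""
    return ALPHA * num_solved / num_instances


def s_time(solved_times, num_instances, timeout=600.0):
    """Assumed logarithmic time term: an instance solved instantly earns the
    full share, one solved exactly at the timeout earns nothing."""
    shares = (1.0 - math.log(t + 1.0) / math.log(timeout + 1.0)
              for t in solved_times)
    return (100.0 - ALPHA) / num_instances * sum(shares)


def score(solved_times, num_instances, timeout=600.0):
    """solved_times holds the running times (seconds) of solved instances only."""
    return (s_solve(len(solved_times), num_instances)
            + s_time(solved_times, num_instances, timeout))


# The same 9-second gap matters much more near 1 s than near 100 s.
gap_fast = s_time([1.0], 1) - s_time([10.0], 1)
gap_slow = s_time([100.0], 1) - s_time([109.0], 1)
```

With this form, going from 1 to 10 seconds costs far more points than going from
100 to 109 seconds, which is the damping effect the logarithm is meant to give.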
Benchmark suite. There were a total of 35 selected benchmark domains, mainly
classified according to the computational complexity of the related problem, in
Polynomial, NP, and Beyond-NP ones, where this latter category was split into
$\Sigma^P_2$ and Optimization. The benchmark suite included planning domains, temporal
and spatial scheduling problems, combinatorial puzzles, a few graph problems, and
a number of applicative domains taken from the database, information extraction
and molecular biology fields. According to their type, problems were also classified
into Search, Query and Optimization ones.

Software and Hardware Settings. The Competition took place on servers featuring
a 4-core Intel Xeon CPU X3430 running at 2.4 GHz, with 4 GiB of physical RAM.
All the systems were benchmarked with just one out of four processors enabled,
with the exception of the parallel solver clasp-mt, and were allowed to use up to 3
GiB of user memory. The allowed execution time for each problem instance was
set at 600 seconds.



5 Results and Discussion 

The final competition results are reported in Figures [1] and [2] for the System and
M&S Tracks, respectively. The detailed results for each considered benchmark problem,
and cactus plots detailing the number of instances solved and the corresponding
time, on a per participant basis, are reported in Appendix F (note that timed
out instances are not drawn). Full competition figures, detailed on a per instance
basis, together with executable packages and declarative specifications submitted
by participants, are available on the competition web site (Calimeri et al. 2010).



5.1 System Track Results 

Polynomial Problems. Grounding modules are mainly assessed while dealing with
problems from this category, with the notable exception of two problems for which,
although they are known to be solvable in polynomial time, we chose their natural
declarative encoding, making use of disjunction. In the case of these last two
problems, the "combined" ability of grounder and propositional solver modules was
tested. The aim was to measure whether, and to what extent, a participant system
could be able to converge on a polynomial evaluation strategy when fed with such a
natural encoding. All the participant systems employed Gringo (v. 3.0.3) as the
grounding module: however, we noticed some systematic performance differences, owing
to the different command line options fed to Gringo by participants. The winner of
the category is clasp, with 213 points, as shown in Figure [1]. Interestingly,
Figure 5(a) of Appendix F shows a sharp difference between a group of easy and a
group of hard instances; notably, the latter required a bigger memory footprint when
evaluated. Indeed, it is worth mentioning that the main cause of failure in this
category was running out of memory, rather than time-out: instances were in fact
usually relatively large.
NP Problems. The results of this category show how claspfolio (609 points) slightly
outperformed clasp and IDP (597 points), these latter having a slightly better time
score (227 versus 224 of claspfolio).

Beyond-NP Problems. Only the two systems claspD and CMODELS were able to
deal with the two problems in this category, with claspD solving and gaining points
on both problems, and CMODELS behaving well on MinimalDiagnosis only.

Overall Results. Figure [1] shows claspD as the overall winner, with 861 points: 560
points were awarded for the instance score, corresponding to a total of 112 instances
solved out of 200. claspfolio and clasp follow with a respective grand total of 818
and 810. It is worth noting that claspD is the only system, together with CMODELS,
capable of dealing with the two Beyond-NP problems included in the benchmark
suite, this giving to claspD a clear advantage in terms of score.
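
As a quick arithmetic check on these figures, the short sketch below reproduces
the instance score from the number of solved instances. It rests on assumptions
that are not stated in this section: that every benchmark domain has the same
number of instances (10 is assumed here, so that 200 instances would correspond
to 20 domains) and that each domain awards up to 50 instance points linearly.
The constant and function names are illustrative.

```python
# Assumed values, chosen to be consistent with the reported totals.
INSTANCES_PER_DOMAIN = 10
MAX_INSTANCE_POINTS = 50


def instance_points(total_solved):
    """Instance score summed over equally sized domains (linear per domain)."""
    return MAX_INSTANCE_POINTS * total_solved / INSTANCES_PER_DOMAIN


claspd_total = 861
claspd_instance = instance_points(112)
claspd_time = claspd_total - claspd_instance
```

Under these assumptions each solved instance is worth 5 points, so 112 solved
instances give the reported 560 points, and the remaining 301 points of the
861-point total would come from the time score.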

Non-competing Systems. After the official competition run, we additionally ran five 
non-competing systems, including: the parallel system clasp-mt, and a number of 
solvers which did not meet the final official deadline. We included here also those 
systems in order to give a wider picture of the state of the art in ASP solving. 

Noncompeting systems are reported in italics in Figure [1], and their behavior is
plotted in Figures 5-8 of Appendix F, with the winner of the System Track used as
a yardstick. Note that noncompeting executables were mostly variants of the
executables presented by the Potassco and the Aalto teams. None of them performed
clearly better than the "official" participating versions. The best sequential
noncompeting system was Aclasp, with a score of 780 points, which would have reached
the fifth absolute position in the final classification, see Figure [1].

A special mention goes to the parallel system clasp-mt, the only system that
ran on a machine with four CPUs enabled. clasp-mt is a comparatively young
system and (although it was disqualified from some domains⁴) it is clearly the best
performer in NP, totaling 629 points in this category, corresponding to 29 points
more than the best sequential system (claspfolio).

This result confirms the importance of investing in parallel solving techniques for
exploiting today's widespread parallel hardware.

5.2 M&S Track Results 

Polynomial Problems. The winner in the category (see Figure [2]) is the Potassco 
team. It is worth mentioning that the runner-up BPsolver, which for this category 
presented solutions based on predicate tabling, was the absolute winner in three out 
of the seven problem domains. This suggests that top-down evaluation techniques 
might pay off, especially for polynomial problems. 



4 The overall score for a problem P is set to zero if the system produces an incorrect answer for
some instance of P. See Appendix C for more details.
NP Problems. The category shows the Potassco team as winner with 1463 points,
closely followed by EZCSP (1406 points), which was notably the best performing
team on ReverseFolding and Packing. Also, BPsolver was by far the fastest
system in six domains out of nineteen, although its performance was fluctuating
(e.g. in Solitaire and GraphColouring) and it was disqualified in a couple of
domains.⁵ The very good performance of idp on GraphColouring is also worth
mentioning.

Beyond-NP Problems. Only two teams submitted solutions for the two problems 
in this category, with the Potassco team being the clear winner in both domains. 

Optimization Problems. In this category the two clasp-based solution bundles (Po- 
tassco and Aclasp) outperformed the rest of participants, with idp being the first 
nonclasp-based system in the category. As in the NP category, BPsolver was the 
best team in a couple of domains. 

Overall Results. Figure [2] shows the Potassco solution bundle as the clear winner of
the M&S Track (see Figure F2). The results detailed per benchmark (see Appendix
F, Figure F4) show that all the teams were best performers in one or more domains,
with very encouraging results.

A special remark must be made concerning the fastdownward team, coming from
the planning community. It competed in only a small number of domains, mostly
corresponding to planning problems, totaling 433 points. Given the small number
of problems which fastdownward participated in, a clear comparison cannot be
drawn: however, it can be noted that fastdownward performed quite well on
HydraulicLeaking and HydraulicPlanning. In other domains, the relatively low
performance can be explained considering that some of the problems were specified
more in the form of knowledge representation problems, with some common
restrictions specific to ASP.

The solutions proposed by all participants were very heterogeneous, ranging from
purely declarative to the usage of Prolog in a nearly procedural style. Among the
lessons learned, it is worth observing that purely declarative solutions very
often paid off in terms of efficiency, and outperformed comparatively more tweaked
approaches to problem solving.



6 Further Analysis 

In this section a number of additional analyses are reported, with the aim of giv- 
ing a clearer and more complete picture of the state of the art in ASP solving and 
declarative programming. In particular: we compared the winning solution bundle⁶



5 For the sake of scientific comparison, Figure F4 also reports scores obtained by bpsolver on
a late submission for HanoiTowers, fixing the faulty solution submitted within the deadline.
Grand totals include this latter score.

6 By "solution bundle" we mean here the combination of ad-hoc tuned solver binaries together
with ad-hoc encodings, as they were submitted in the Second ASP Competition.
Fig. 2. M&S Track - Results by Categories

submitted to the former ASP Competition with the updated ones submitted also to
the Third ASP Competition, so that two years of advances in the state of the art are
outlined; we measured the distance in performance between ASP-based solutions and
some specialized ad-hoc solutions available in the literature, for some specific
benchmark problems considered in the current competition; and, finally, we assessed
the impact of fine-tuning of systems/solutions by comparing specialized executables
and problem encodings submitted to the M&S Track with a good-performing
default-setting ASP system of the System Track. The outcomes are summarized
in Figure [3]. A grey strip highlights the lines corresponding to "yardstick" systems
which competitors have been compared to. Total scores are computed according
to the rules of this Competition; Solved-score, which roughly corresponds to the
score computed according to past Competition rules, is obtained by subtracting
the score corresponding to the time quota from the Total score introduced in this
competition (see Appendix C for more insights). The results are discussed in detail
in the following.
The State-of-the-art after Two Years of Improvements. In order to assess
possible improvements over former participants in the Second ASP Competition,
we selected some significant problems appearing both in the Third and Second
edition of the Competition with the same specification. This set of problems counts a
polynomial problem (Grammar-based Information Extraction), an NP problem
(GraphColoring), and two optimization problems (FastFoodOptimization
and MaximalClique). On these benchmarks, we ran the solution bundles
submitted by the winners of the 2nd ASP Competition (the Potassco team) alongside
all the solution bundles of the current participants in the M&S Track. The
instance families used for this test were composed of both the instances used in the
2nd ASP Competition and in the current edition.

The state of the art in the last two years has been clearly pushed forward, as
witnessed by the results reported in the four leftmost sections of Figure [3]
(corresponding to the above-mentioned problems). Indeed, the new solution bundles
based on clasp (indicated by Potassco in Figure [3]) outperformed in all considered
benchmarks the ones (indicated by clasp '09 in Figure [3]) submitted to the Second
ASP competition. Note also that other current solution bundles (i.e., bpsolver,
Aclasp, IDP) were often able to outperform clasp '09, and are generally comparable
to Potassco, even beating it on the Grammar-based Information Extraction
and GraphColoring problems.

Participants vs Ad-hoc Solutions. Participants in the Competition were based 
on declarative formalisms, and were essentially conceived as "general-purpose" 
solvers. For a selection of benchmark domains, we compared participants in the 
M&S Track with specialized ad-hoc solutions, not necessarily based on declarative 
specifications, with the aim of figuring out what a user might expect to pay in order 
to enjoy the flexibility of a declarative system. 

Maximal Clique. MaximalClique is a graph problem with a longstanding history
of research towards efficient evaluation algorithms. It was one of the problems
investigated in the early Second DIMACS Implementation Challenge (Johnson and Trick 1996),
and research continued on the topic later on (Gibbons et al. 1996; Bomze et al. 1999;
Gutin 2004). In the Competition, the problem was specified as finding the maximum
cardinality clique on a given graph (Calimeri et al. 2010) and most of the
instances were taken from BHOSLIB (Xu 2004).

We compared participants in the M&S Track with Cliquer (Niskanen 2003). Cliquer
is an up-to-date implementation of an exact branch-and-bound algorithm,
which is expected to perform well while dealing with several classes of graphs
including sparse, random graphs and graphs with certain combinatorial properties
(Östergård 2002).

Cliquer is based on an exact algorithm: in this respect we found it the most
natural choice for comparison with participants in the ASP Competition, all of
which are based on exhaustive search algorithms. A comparison with other existing
approximate algorithms (Boppana and Halldórsson 1992; Feige 2005) would
expectedly have been unbalanced in favor of the latter. Our findings show that, in the
setting of the competition, Potassco and Aclasp have performance comparable with
Cliquer, while IDP performed quite close to them.

Crossing Minimization in layered graphs. Minimizing crossings in layered graphs is
an important problem with relevant impact, e.g., in the context of VLSI layout
optimization. The problem has been studied thoroughly, and valuable algorithms have
been proposed for solving it, among which (Jünger et al. 1997; Healy and Kuusik 1999;
Mutzel 2000) and (Gange et al. 2010). The instances considered for the Competition
were taken from the graphviz repository (Graphviz 201

We ran the two ad-hoc solutions proposed in (Gange et al. 2010) over the M&S
Track instance family and compared results with outcomes of participants in the
M&S competition. The two solutions were, respectively, based on a translation
to SAT and to Mixed Integer Programming. The former was run using MiniSat+
(Eén and Sörensson 2006) as solver, while the latter used CPlex 12.0 (CPLEX 2011).
Both solvers were run using default settings.

The comparison shows a big gap between participants in the competition and 
the two yardsticks, which both perform much better. 

Reachability. This polynomial problem is a distinctive representative of problems 
that can be naturally specified using recursion in plain logic programming, lending 
itself to comparison with other logic programming-based systems. 

We compared the outcomes of participants in the M&S Track with XSB 3.2
(XSB 2011), one of the reference systems of the OpenRuleBench initiative.
OpenRuleBench (Fodor et al. 2011) is aimed at comparing rule-based systems from
the Deductive Database area, Prolog-based systems, and systems oriented to RDF triples.

All the three participants submitting a solution to the Reachability problem 
outperformed XSB, especially in terms of time performance. However, it must be 
stated that the scoring system of the ASP Competition purposely does not exclude 
loading and indexing steps when measuring execution times, differently from the 
setting of OpenRuleBench; furthermore, XSB was run with its off-the-shelf config- 
uration, except for attribute indexing and tabling appropriately enabled. Indeed, 
BPsolver depends here on the same technology as XSB (tabling and top down), but 
fine-tuned to the problem. 

The Effects of Fine-tuning. Recall that the setting of the System Track pre- 
vented all participants from developing domain-dependent solutions and, on the 
contrary, the M&S Track allowed the submission of fine-tuned encodings and the 
static selection of systems parameters/heuristics. 

In order to assess the impact of fine-tuning, we selected as yardstick the clasp
version that participated in the System Track, labelled clasp (sysTrack) in Figure 3;
then we ran it over the P and NP problems which were in common between the System
Track and the M&S Track problem suites.⁷ clasp (sysTrack) was run using the fixed

7 The HydraulicPlanning benchmark has been excluded since its specification was different in
the M&S Track.

Fig. 3. Comparisons with State-of-the-art Solutions and other Yardsticks



ASP-Core encodings and the settings of the System Track, but over the different
(and larger) instance sets coming from the M&S Track. These choices ensured a
comparison more targeted to the assessment of the impact of tuning: indeed, (i) the
clasp (sysTrack) executable is the "naked" solver of choice in almost all the solution
bundles submitted by the Potassco team, the winner of the M&S Track; (ii) clasp
(sysTrack) is the winner of the System Track in the P category, and runner-up in
the NP category (in both it performed better than the overall winner claspD); (iii)
there is no significant difference between the solutions presented by the Potassco
team in both the System and the M&S Track for BeyondNP problems.

The obtained results are reported in the rightmost side of Figure 3 and, as expected,
confirm the importance of fine-tuning and customized encodings. Indeed, "tuned"
solution bundles outperformed the fixed configuration of clasp (sysTrack). This
clearly indicates the need for further developing new optimization techniques and
self-tuning methods (e.g., along the lines of (Gebser et al. 2011)), and for further
extending the basic standard language, in order to bring efficient ASP solutions within
reach of users who are not experts in the system's internals and/or specific features.



7 Concluding remarks 

Much effort has been spent in the last 20 years by the ASP community, and
outstanding results have been achieved since the first seminal papers; ASP and ASP
systems can nowadays be profitably exploited in many application settings, not only
thanks to the declarative and expressive power of the formalism, but also thanks
to continuously improving performance. Even a two-year horizon indicates that
things are getting better and better. In addition, it is interesting to note that,
although the difference between the specifically tailored solutions and the ones with
factory settings is significant, the current state-of-the-art ASP implementations
offer a good experience to application developers, given the nice declarative
approach of the formalism and the mature, robust, and currently well-performing
available systems.

In the sense that it does not feature any parameter tuning technique.

Nevertheless, there is still much room for improvement. For instance, the
comparison with some ad-hoc solvers confirmed that performance is no longer a
tout-court weak point, but the gap with respect to some others suggests that it
might be further improved. The main issue, however, still remains the lack of a
sufficiently broad and standard language.

Appendix A ASP-Core Syntax and Semantics overview

In the following, an overview of both the main constructs and the semantics of the
ASP-Core language is reported. The full ASP-Core language specification can be
found in (Calimeri et al. 2011a).

A.1 ASP-Core Syntax

For the sake of readability, the language specification is hereafter given according to
the traditional mathematical notation. A lexical matching table from the following
notation to the actual raw input format prescribed for participants is provided in
(Calimeri et al. 2011a).

Terms, constants, variables. Terms are either constants or variables. Constants can 
be either symbolic constants (strings starting with lower case letter) , strings (quoted 
sequences of characters), or integers. Variables are denoted as strings starting with 
an upper case letter. As a syntactic shortcut, the special variable "_" is a placeholder 
for a fresh variable name in the context at hand. 

Atoms and Literals. An atom is of the form $p(X_1, \ldots, X_n)$, where $p$ is a predicate
name, $X_1, \ldots, X_n$ are terms, and $n$ ($n \geq 0$) is the fixed arity associated to
$p$.⁹ A classical literal is either of the form $a$ (positive classical literal) or $\neg a$
(negative classical literal), for $a$ being an atom. A naf-literal is either a positive
naf-literal $a$ or a negative naf-literal not $a$, for $a$ being a classical literal.

Rules. An ASP-Core program P is a finite set of rules. A rule r is of the form

$$a_1 \vee \cdots \vee a_n \leftarrow b_1, \ldots, b_k, \mathrm{not}\ n_1, \ldots, \mathrm{not}\ n_m.$$

where $n, k, m \geq 0$, and at least one of $n, k$ is greater than 0; $a_1, \ldots, a_n$, $b_1, \ldots, b_k$,
and $n_1, \ldots, n_m$ are classical literals. $a_1 \vee \cdots \vee a_n$ constitutes the head of r, whereas
$b_1, \ldots, b_k, \mathrm{not}\ n_1, \ldots, \mathrm{not}\ n_m$ is the body of r. As usual, whenever $k = m = 0$, we
omit the "$\leftarrow$" sign. r is a fact if $n = 1$, $k = m = 0$, while r is a constraint if $n = 0$.

Rules written in ASP-Core are assumed to be safe. A rule r is safe if all its 
variables occur in at least one positive naf-literal in the body of r. A program P 
is safe if all its rules are safe. A program (a rule, a literal, an atom) is said to be 
ground (or propositional) if it contains no variables. 
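
The safety condition above can be checked mechanically. The following is a minimal Python sketch, not part of the competition tooling: the representation of a literal as a `(predicate, arguments)` pair and the function names are assumptions made for illustration, and the anonymous variable "_" is deliberately not handled.

```python
def is_variable(term):
    """Variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def variables(literals):
    """Collect the variables occurring in a list of (predicate, args) literals."""
    return {t for (_pred, args) in literals for t in args if is_variable(t)}


def is_safe(head, pos_body, neg_body):
    """A rule is safe if each of its variables occurs in a positive body literal.

    head, pos_body, neg_body are lists of (predicate, args) pairs
    (a hypothetical encoding chosen for this sketch).
    """
    all_vars = variables(head) | variables(pos_body) | variables(neg_body)
    return all_vars <= variables(pos_body)


# reach(X,Y) <- edge(X,Y).   every variable is bound by the positive body
safe_rule = is_safe([("reach", ("X", "Y"))], [("edge", ("X", "Y"))], [])

# p(X) <- not q(X).          X occurs only in the head and under negation
unsafe_rule = is_safe([("p", ("X",))], [], [("q", ("X",))])
```

Facts over constants are trivially safe under this check, since they contain no variables at all.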

Ground Queries. A program P can be coupled with a ground query in the form q?,
where q is a ground naf-literal.

A.2 ASP-Core Semantics

The semantics of ASP-Core, based on (Gelfond and Lifschitz 1991), exploits the
traditional notion of the Herbrand interpretation.

9 The atom referring to a predicate p of arity 0 can be stated either in the form p() or p.

Herbrand universe. Given a program P, the Herbrand universe of P, denoted by
$U_P$, is the set of all constants occurring in P. The Herbrand base of P, denoted by
$B_P$, is the set of all ground naf-literals obtainable from the atoms of P by replacing
variables with elements from $U_P$.

Ground programs and Interpretations. A ground instance of a rule r is obtained by
replacing each variable of r by an element from $U_P$. Given a program P, we define
the instantiation (grounding) grnd(P) of P as the set of all ground instances of its
rules. Given a ground program P, an interpretation I for P is a subset of $B_P$. A
consistent interpretation is such that $\{a, \neg a\} \not\subseteq I$ for any ground atom $a$. In the
following, we only deal with consistent interpretations.

Satisfaction. A positive naf-literal $\ell = a$ (resp., a naf-literal $\ell = \mathrm{not}\ a$), for predicate
atom $a$, is true with respect to an interpretation I if $a \in I$ (resp., $a \notin I$); it is
false otherwise. Given a ground rule r, we say that r is satisfied with respect to an
interpretation I if some atom appearing in the head of r is true with respect to I
or some naf-literal appearing in the body of r is false with respect to I.

Models. Given a ground program P, we say that a consistent interpretation I is a
model of P iff all rules in grnd(P) are satisfied w.r.t. I. A model M is minimal if
there is no model N for P such that $N \subset M$.

Gelfond-Lifschitz reduct and Answer Sets. The Gelfond-Lifschitz reduct (Gelfond and Lifschitz 1991)
of a program P with respect to an interpretation I is the positive ground program
$P^I$ obtained from grnd(P) by: (i) deleting all rules with a negative naf-literal false
w.r.t. I; (ii) deleting all negative naf-literals from the remaining rules. $I \subseteq B_P$ is
an answer set for P iff I is a minimal model for $P^I$. The set of all answer sets for
P is denoted by $AS(P)$.
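
To make the definitions of satisfaction, reduct and answer set concrete, here is a brute-force Python sketch for small ground programs. It is an illustration only and bears no relation to how the participant solvers work: the rule encoding as `(head, positive body, negative body)` triples of atom sets, all function names, and the simplification of treating classical literals as plain atoms (so consistency is not checked) are assumptions of this sketch. The last function applies the cautious-reasoning reading of ground queries, i.e. membership in every answer set.

```python
from itertools import combinations


def rule(head=(), pos=(), neg=()):
    """Build a ground rule as a triple of frozensets of atoms."""
    return (frozenset(head), frozenset(pos), frozenset(neg))


def powerset(atoms):
    """Enumerate all subsets of a collection of atoms."""
    atoms = sorted(atoms)
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            yield frozenset(combo)


def reduct(rules, interp):
    """Gelfond-Lifschitz reduct: drop rules having a negative naf-literal
    false w.r.t. interp, then delete the remaining negative naf-literals."""
    return [(head, pos, frozenset())
            for (head, pos, neg) in rules
            if not (neg & interp)]


def is_model(rules, interp):
    """Each rule needs a true head atom or a false body naf-literal."""
    for head, pos, neg in rules:
        body_true = pos <= interp and not (neg & interp)
        if body_true and not (head & interp):
            return False
    return True


def answer_sets(rules):
    """Return every interpretation that is a minimal model of its own reduct."""
    atoms = set()
    for head, pos, neg in rules:
        atoms |= head | pos | neg
    found = []
    for cand in powerset(atoms):
        red = reduct(rules, cand)
        if not is_model(red, cand):
            continue
        # Minimality: no proper subset of cand may be a model of the reduct.
        if any(is_model(red, sub) for sub in powerset(cand) if sub != cand):
            continue
        found.append(cand)
    return found


def cautious(rules, atom):
    """A ground query is true iff the atom belongs to all answer sets."""
    return all(atom in a for a in answer_sets(rules))


# a v b.   c <- a.   c <- b.
program = [rule(head={"a", "b"}), rule(head={"c"}, pos={"a"}), rule(head={"c"}, pos={"b"})]
```

On `program`, the enumeration yields the two answer sets {a, c} and {b, c}, so the query on c holds cautiously while the query on a does not. The exhaustive enumeration is exponential in the number of atoms and is meant only to mirror the definitions step by step.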

Semantics of ground queries. Let P be a program and q? be a query; q? is true iff for
all $A \in AS(P)$ it holds that $q \in A$. Basically, the semantics of queries corresponds
to cautious reasoning, since a query is true if the corresponding atom is true in all
answer sets of P.

The competition settings for the two tracks are depicted in Figure B1. The
problems collected into the official problem suite (see Appendix D) were grouped
into two different suites, one per track. The problems belonging to the System
Track suite were nearly a proper subset of the ones featured in the M&S Track. In
both tracks, for each problem P, a number of instances $I_P^1, \ldots, I_P^N$ were selected.
For any problem P included in the System Track, a corresponding fixed declarative
specification $E_P$, written in ASP-Core, was also given.

A team T participating in the System Track had to provide a unique executable
system $S_T$. A team participating in the M&S Track, instead, had to produce a
possibly different execution bundle $SystemBox_{T,P}$ for any problem P in the M&S
Track suite. For each problem P, the participants were fed iteratively with all
instances $I_P^i$ of P (in the case of the System Track each instance was fed together
with the corresponding problem encoding $E_P$).

The submitted executables were challenged to produce either a witness solution,
denoted by $W_P^i$, or to report that no solution exists, within a predefined amount
of allowed time. The expected output format, which is determined by the type of
the problem (search, query, optimization), is reported in (Calimeri et al. 2011a).
Participants were made aware, well in advance, of fixed encodings (in the case
of the System Track), while they were provided only a small set of corresponding
training instances. Official instances were kept secret until the actual start of the
competition. Scores were awarded according to the competition scoring system (see
Section 4 and Appendix C).

Definition of "syntactic special purpose technique". The committee classified
as forbidden in the System Track: the switch of internal solver options depending
on command-line filenames or on predicate and variable names, and "signature"
techniques aimed at recognizing a particular benchmark problem, such as
counting the number of rules, constraints, predicates and atoms in a given encoding.
In order to discourage the adoption of forbidden techniques, the organizing
committee reserved the right to introduce syntactic means for scrambling program
encodings, such as random renaming of files, predicates and variables. Furthermore, the
committee reserved the right to replace official program encodings arbitrarily with
equivalent syntactically-changed versions.

It is worth noting that, on the other hand, the semantic recognition of the pro- 
gram structure was allowed, and even encouraged. Allowed semantic recognition 
techniques explicitly included: (i) recognition of the class the problem encoding 
belongs to (e.g., stratified, positive, etc.), with possible consequent switch-on of on- 
purpose evaluation techniques; (ii) recognition of general rule and program struc- 
tures (e.g., common un-stratified even and odd-cycles, common join patterns within 
a rule body, etc.), provided that these techniques were general and not specific to
a given problem selected for the competition.



For problems appearing in both tracks, the instances selected for the M&S Track and those
selected for the System Track were not necessarily the same.

Detailed Software and Hardware Settings. The Competition took place on a battery
of four servers, featuring a 4-core Intel Xeon CPU X3430 running at 2.4 GHz, with
4 GiB of physical RAM and PAE enabled.

The operating system of choice was Linux Debian Lenny (32bit), equipped with
the C/C++ compiler GCC 4.3 and common scripting/development tools. Competitors
were allowed to install their own compilers/libraries in local home directories,
and to prepare system binaries for the specific Competition hardware settings. All
the systems were benchmarked with just one out of four processors enabled, with
the exception of the parallel solver clasp-mt that could exploit all the available
cores/processors. Each process spawned by a participant system had access to the
usual Linux process memory space (slightly less than 3 GiB user space + 1 GiB
kernel space). The total memory allocated by all the child processes created was
however constrained to a total of 3 GiB (1 GiB = $2^{30}$ bytes). The memory footprint
of participant systems was controlled by using the Benchmark Tool Run.
This tool is not able to detect short memory spikes (within 100 milliseconds) and,
in some corner cases, memory overflow is detected with a short delay; however, we
pragmatically assumed the tool as the official reference.

Detection of Incorrect Answers. Each benchmark domain P was equipped with a
checker program $C_P$ taking as input values a witness A and an instance I, and
answering "true" in case A is a valid witness for I w.r.t. problem P. The collection
of checkers underwent thorough assessment and then was pragmatically assumed
to be correct.

Suppose that a system S is faulty for instance I of problem P; then, there were
two possible scenarios in which incorrect answers needed detection and subsequent
disqualification for a given system:

• S produced an answer A, and A was not a correct solution (either because I
was actually unsatisfiable or A was simply wrong). This scenario was detected
by checking the output of $C_P(A, I)$;

• S answered that the instance was not satisfiable, but actually I had some
witness. In this case, we checked whether a second system S' produced a
solution A' for which $C_P(A', I)$ was true.
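
The two scenarios can be summarised in a few lines of Python. This is a hedged sketch of the decision logic only, under stated assumptions: the `detect_faulty` function, the dictionary-based answer format (a witness, or `None` for a reported "no solution", with timed-out systems simply omitted), and the toy graph-colouring checker are all hypothetical and are not the competition's actual checker infrastructure.

```python
def detect_faulty(checker, instance, answers):
    """Return the systems whose answer on `instance` is detectably incorrect.

    checker(witness, instance) -> bool plays the role of the checker C_P.
    answers maps a system name to its witness, or to None when the system
    claimed that no solution exists (hypothetical format for this sketch).
    """
    # Systems whose witness is accepted by the checker.
    valid = {s for s, w in answers.items()
             if w is not None and checker(w, instance)}
    faulty = set()
    for system, witness in answers.items():
        if witness is not None and system not in valid:
            faulty.add(system)   # scenario 1: a wrong witness was produced
        elif witness is None and valid:
            faulty.add(system)   # scenario 2: another system found a valid witness
    return faulty


def coloring_checker(witness, edges):
    """Toy checker: a colouring is valid if adjacent nodes get different colours."""
    return all(u in witness and v in witness and witness[u] != witness[v]
               for u, v in edges)


answers = {
    "s1": {"a": 1, "b": 2},   # valid colouring
    "s2": {"a": 1, "b": 1},   # invalid colouring
    "s3": None,               # claims unsatisfiability
}
faulty = detect_faulty(coloring_checker, [("a", "b")], answers)
```

Note that, as in the text, a wrong "unsatisfiable" claim can only be exposed when some other system exhibits a witness accepted by the checker.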

Concerning optimization problems, checkers also produced the cost C of the given
witness. This latter value was considered when computing scores and for assessing
answers of systems. Note that cases of general failure (e.g. out of memory, other
abrupt system failures) were not grounds for disqualification on a given benchmark.
As a last remark, note that in the setting of the System Track, where problem
encodings were fixed, a single stability checker for answer sets could replace our
collection of checkers. We preferred to exploit already available checker modules,
which were also used for assessing the correctness of fixed official encodings set for
the System Track. This enabled us to detect some early errors in fixed encodings;
however, the lesson we learned suggests that a general stability checker should be
placed side-by-side with specific benchmark checkers.

http://fmv.jku.at/run/

Other settings. The committee kept its neutral position and did not disclose any 
material submitted by participants until the end of the competition: however, par- 
ticipants were allowed to share their own work willingly at any moment. The above 
choice was taken in order to prefer scientific collaboration between teams over a 
strict competitive setting. All participants were asked to agree that any kind of
submitted material (system binaries, scripts, problem encodings, etc.) was to be
made public after the competition, so as to guarantee transparency and reproducibility.
None of the members of the organizing committee submitted a system to the
Competition, in order to play the role of neutral referee properly and guarantee an 
unbiased benchmark selection and rule definition process. 

Appendix C Detailed scoring regulations 
C.1 Principles

The main factors that were taken into account in the scoring framework are illus- 
trated next. 

1. Benchmarks with many instances should not dominate the overall score of a 
category. Thus, the overall score for a given problem P was normalized with 
respect to the number N of selected instances for P. 

2. Unsound solvers and encodings were strongly discouraged. Thus, if system
S produced an incorrect answer for an instance of a problem P, then S was
disqualified from P and the overall score achieved by S for problem P was
invalidated (i.e., set to zero).

3. A system managing to solve a given problem instance sets a clear gap over all 
systems not able to do so. Thus, a flat reward for each instance / of a problem 
P was given to a system S that correctly solved I within the allotted time. 

4. Concerning time performance, human beings are generally more receptive to 
the logarithm of the changes of a value, rather than to the changes themselves; 
this is especially the case when considering evaluation times. Indeed, different 
systems with time performances being in the same order of magnitude are 
perceived as comparatively similar, in terms of both raw time performance 
and quality; furthermore, a system is generally perceived as clearly fast, when 
its solving times are orders of magnitude below the maximum allowed time. 
Keeping this in mind, and analogously to what has been done in SAT
competitions, a logarithmically weighted bonus was awarded to faster systems
depending on the time needed for solving each instance.

5. In the case of optimization problems, scoring should also depend on the quality
of the provided solution. Thus, bonus points were awarded to systems able to
find better solutions. Also, we wanted to take into account the fact that small



See, for instance, the log based scoring formulas at http://www.satcompetition.org/2009/spec2009.html






improvements in the quality of a solution are usually obtained at the price 
of much stronger computational efforts: thus the bonus for a better quality 
solution has been given on an exponential weighting basis. 



C.2 Scoring Rules 

The final score obtained by a system S in a track T consisted of the sum over the 
scores obtained by S in all benchmarks selected for T. In particular, a system could 
get a maximum of 100 points for each given benchmark problem P considered for 
T. The overall score of a system on a problem P counting N instances, hereafter 
denoted by S(P), was computed according to the following formulas that depend 
on whether P is a search, query or optimization problem. 

Wrong Answers. In the case where S produced an output detected as incorrect
for at least one instance of P, then S was disqualified from P and S(P) was set to
zero (i.e., S(P) = 0 in case of incorrect output); otherwise, the following formulas
were applied for computing S(P).



Search and Query Problems. In case of both search and query problems the score
S(P) was computed by the sum

$$S(P) = S_{solve}(P) + S_{time}(P)$$

where $S_{solve}(P)$ and $S_{time}(P)$ take into account the number of instances solved by S
in P and the corresponding running times, respectively; in particular

$$S_{solve}(P) = \alpha \frac{N_S}{N}; \qquad S_{time}(P) = \frac{100 - \alpha}{N} \sum_{i=1}^{N} \left( 1 - \frac{\log(t_i + 1)}{\log(t_{out} + 1)} \right)$$

for $N_S$ being the number of instances of P solved within the time limit, $t_{out}$ the
maximum allowed time, $t_i$ the time spent by S while solving instance $i$, and $\alpha$ a
percentage factor balancing the impact of $S_{solve}(P)$ and $S_{time}(P)$ on the overall
score. Both $S_{solve}(P)$ and $S_{time}(P)$ were rounded to the nearest integer.

Incorrect answers were determined as specified in Appendix B.

Fig. C1. Scoring Functions Exemplified (one instance, 100 pts max, $t_{out} = 600$).

Note that $S_{time}(P)$ was specified in order to take into account the "perceived"
performance of a system (as discussed in C.1). Figure C1(a) gives an intuitive
idea about how $S_{time}$ distributes a maximum score of 100 points considering a
single instance and $t_{out} = 600$. Note that half of the maximum score (50 points) is
given to performance below about 24 seconds, and significant differences in scoring
correspond to differences of orders of magnitude in time performance.
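
As a worked illustration of the search and query scoring, the sketch below computes S(P) in Python. It assumes the reading $S_{solve}(P) = \alpha N_S / N$ and $S_{time}(P) = \frac{100 - \alpha}{N} \sum_i (1 - \log(t_i + 1) / \log(t_{out} + 1))$, with unsolved instances contributing nothing to either term (equivalently, $t_i = t_{out}$); the function name and argument layout are hypothetical, and Python's built-in `round`, which rounds exact halves to the nearest even integer, stands in for the unspecified rounding rule.

```python
import math


def search_score(solved_times, n_instances, alpha=50, t_out=600):
    """Score of one system on one search/query problem (sketch).

    solved_times: running times in seconds of the instances solved correctly;
    times above t_out are treated as unsolved.
    """
    solved = [t for t in solved_times if t <= t_out]
    s_solve = alpha * len(solved) / n_instances
    s_time = (100 - alpha) / n_instances * sum(
        1 - math.log(t + 1) / math.log(t_out + 1) for t in solved
    )
    # Both components are rounded to the nearest integer before summing.
    return round(s_solve) + round(s_time)


# One instance, time component only (alpha = 0): a run of about 23.5 seconds
# sits at the midpoint of the logarithmic scale.
half_way = search_score([math.sqrt(601) - 1], n_instances=1, alpha=0)
```

With `alpha=0` and a single instance, the half-score point falls at $t = \sqrt{601} - 1 \approx 23.5$ seconds, which agrees with the "about 24 seconds" remark above.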

Optimization Problems. As in the previous edition, the score of a system S in the
case of optimization problems depends on whether S was able to find a solution or
not, and in the former case, the score depends on the quality of the given solutions.
In addition, as in the case of decision problems, time performance is taken into
account. We assumed the cost function associated with optimization problems must
be minimized (the lower, the better), and it had 0 as its lowest bound.

The overall score of a system for an optimization problem P is given by the sum

$$S(P) = S_{opt}(P) + S_{time}(P)$$

where $S_{time}(P)$ is defined as for search problems, and $S_{opt}(P)$ takes into account
the quality of the solution found. In particular, for each problem P, system S is
rewarded with a number of points defined as

$$S_{opt}(P) = \alpha \cdot \sum_{i=1}^{N} S^i_{opt}$$

where, as before, $\alpha$ is a percentage factor balancing the impact of $S_{opt}(P)$ and
$S_{time}(P)$ on the overall score, and $S^i_{opt}$ is computed by properly summing, for each
instance $i$ of P, one or more of these rewards:

1. jr points, if the system correctly recognizes an unsatisfiable instance; 

2. points, if the system produces a correct witness; 

3. points, if the system correctly recognizes an optimum solution and outputs 
it; 

4. $\cdot\, e^{M-Q}$ points, where $Q$ denotes the quality of the solution produced by the
system and $M$ denotes the quality of the best answer produced by any system
for the current instance, for $M$ conventionally set to 100, and $Q$ normalized
accordingly.

Taking into account that an incorrect answer causes the whole benchmark to pay 
no points, three scenarios may come out: timeout, unsatisfiable instance, or solution 
produced. Note thus that points of groups (1), (2) and (3-4-5) cannot be rewarded 
for the same instance. 

The intuitive impact of the above "quality" score $S_{opt}(P)$ can be seen in Figure
C1(b), in which the quality of a given solution, expressed as percentage distance
from the optimal solution, is associated with the corresponding value of $S_{opt}$ (suppose
a maximum of 100 points, $\alpha = 100$, and one single instance per benchmark).
Note that a system producing a solution with a quality gap of 1% with respect
to the best solution gets only 35 points (out of 100) and the quality score quota
rapidly decreases (it is basically 0 for quality gap > 4%), so that small differences
in the quality of a solution determine a strong difference in scoring, according to
the considerations made in C.1.
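
The steepness of the exponential weighting can be seen with a few lines of Python. This is only a sketch of the decay, under two assumptions that are not stated in this form in the text: that the normalized difference $M - Q$ equals minus the percentage quality gap, and that the quality term is scaled to a 100-point maximum as in the single-instance illustration. The function name is hypothetical.

```python
import math


def quality_points(gap_percent, max_points=100.0):
    """Exponentially weighted quality reward, assuming M - Q = -gap_percent.

    max_points is an illustrative scale, not a value taken from the rules.
    """
    return max_points * math.exp(-gap_percent)


# Tabulate the reward for quality gaps from 0% to 5%.
table = [(gap, quality_points(gap)) for gap in range(6)]
for gap, points in table:
    print(f"gap {gap}% -> {points:.1f} points")
```

Under these assumptions a 1% gap already costs almost two thirds of the reward and a gap above 4% leaves less than 2 points, which is roughly in line with the behaviour described above.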

In the present competition, for each problem domain we set $t_{out} = 600$ seconds
and $\alpha = 50$; N has been set to 10 for the System Track, while it varied from problem
to problem for the M&S Track, reaching up to 15 instances per single benchmark
problem.

Appendix D Benchmark Suite 

Benchmark problems were collected, selected and refined during the Call for Problems
stage. The whole procedure led to the selection of 35 problems, which constituted
the M&S Track problem suite. Taking into account what was already discussed
in Section 2, twenty problems out of the ones constituting the M&S Track were
selected for composing the System Track suite: these had a natural and declarative 
ASP-Core encoding. Benchmark problems were classified according to their type 
into three categories: Search problems, requiring to find a solution (a witness) for 
the problem instance at hand, or to notify the non-existence of a solution; Query 
problems, consisting in checking whether a ground fact is contained in all the wit- 
nesses of the problem instance at hand (same as performing cautious reasoning on 
a given logic program); and, Optimization problems, i.e. a search problem in which 
a cost function associated to witnesses had to be minimized. The System Track did 
not contain optimization problems. 

Problems were further classified according to their computational complexity in
three categories: Polynomial, NP and Beyond NP problems, the latter with
the two subcategories composed of $\Sigma^P_2$ problems and optimization problems. In
the following, we break down the benchmark suite according to complexity categories
and discuss some interesting aspects. The complete list of problems included in the
competition benchmark suite, together with detailed problem descriptions, and full
benchmark data, is available on the Competition web site (Calimeri et al. 2010).

Polynomial Problems. We classified in this category problems which are known
to be solvable in polynomial time in the size of the input data (data complexity). In
the Competition suite such problems were usually characterized by the huge size of
instance data and, thus, they were a natural test-bench for the impact of memory
consumption on performance. It is worth noting that the competition aim was
not to compare ASP systems against technologies (database etc.) better tailored
to solving this category of problems; nonetheless, several practical real-world
applications, which competitors should be able to cope with, fall into this category.
Note also that polynomial problems are usually entirely solved by participants'
grounder modules, with little or no effort required by subsequent solving stages:
indeed, grounders are the technology that mainly underwent assessment while dealing
with polynomial problems. There were seven polynomial problems included in the
benchmark suite, six of which were selected for the System Track suite.

The reader can refer to (Papadimitriou 1994) for the definition of basic computational classes
herein mentioned.

Four of the above six were specified in a fragment of ASP-Core (i.e., stratified
logic programs) with polynomial data complexity; a notable exception was made by
the problems StableMarriage and PartnerUnitsPolynomial (which are also
known to be solvable in polynomial time (Gusfield and Irving 1989; Falkner et al. 2010)),
for which we chose their natural declarative encoding, making use of disjunction.
Note that, in the case of these last two problems, the "combined" ability of grounder 
and propositional solver modules was tested. The aim was to measure whether, and 
to what extent, a participant system could be able to converge on a polynomial 
evaluation strategy when fed with such a natural encoding. 

As a further remark, note that the polynomial problem Reachability was
expressed in terms of a query problem, in which it was asked whether two given nodes
were reachable in a given graph: this is a typical setting in which one can test
systems on their search space tailoring techniques (such as magic sets) (Bancilhon et al. 1986).
The one polynomial problem featured in the M&S Track only was CompanyControls,
given its natural modeling in terms of a logic program with aggregates
(Faber et al. 2004), the latter not being included in the ASP-Core specifications.

NP Problems. We classified in this category NP-complete problems or, more 
precisely, their corresponding FNP versions. These problems constituted the "core" 
category, in which to test the aptitude of a system for dealing efficiently with 
problems expressed with the "Guess and Check" methodology (Leone et al. 2006). 
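
The structure of a "Guess and Check" specification can be sketched in Python as follows. Graph colouring is used here purely as an illustrative NP problem, and the exhaustive enumeration only mirrors the declarative shape of such an encoding (a guess part generating candidates, a check part discarding inadmissible ones); actual ASP solvers rely on far more refined search techniques. The name `guess_and_check` is hypothetical.

```python
from itertools import product


def guess_and_check(nodes, edges, colours):
    """Yield every proper colouring of the graph (nodes, edges)."""
    for assignment in product(colours, repeat=len(nodes)):
        # Guess: one colour per node.
        colouring = dict(zip(nodes, assignment))
        # Check: no edge may connect two nodes of the same colour.
        if all(colouring[u] != colouring[v] for u, v in edges):
            yield colouring
```

Each yielded colouring plays the role of a witness, in line with the FNP reading of the problems in this category.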

Among the selected NP problems there were ten puzzle problems, six of which 
were inspired by or taken from planning domains; two classical graph problems; six 
resource allocation problems, both temporal and spatial; and three problems 
related to applicative and academic settings, namely: WeightAssignmentTree 
(Garcia-Molina et al. 2000), which was concerned with the problem of finding the 
best join ordering in a conjunctive query; ReverseFolding, which was aimed at 
mimicking the protein folding problem in a simplified setting (Dovier 2011); and 
MultiContextSystemQuerying, the sole problem considered in the System 
Track only, which was a query problem originating from reasoning tasks in 
Multi-Context Systems (Dao-Tran et al. 2010). Notably, this latter problem had an 
ASP-Core encoding producing several logic submodules, each with independent 
answer sets. The ability to handle both cross-products of answer sets and early 
constraint firing efficiently was assessed here. 

Beyond NP/$\Sigma^P_2$. The category consisted of problems whose decision version was 
$\Sigma^P_2$-complete. Since a significant fraction of current ASP systems cannot properly 
handle this class of problems, only two benchmarks were selected, namely 
StrategicCompanies and MinimalDiagnosis. The former is a traditional problem 
coming from (Cadoli et al. 1997), while the latter originates from an application in 
molecular biology (Gebser et al. 2011). As far as the System Track is concerned, 
$\Sigma^P_2$ problems have an ASP encoding making unrestricted usage of disjunction in 
rule heads. 

Beyond NP/Optimization. These are all the problems with an explicit 
formulation given in terms of a cost function with respect to which each witness has to be 
minimized. The above categorization does not imply a given problem stays outside 
, although this has been generally the case for this edition of the competition. 
The selected problems were of heterogeneous provenance, including classic graph 
problems and sequential optimization planning problems. No benchmark from this 
category was present in the System Track benchmark suite. 
Appendix E System Versions 



As described in Section 3, the participants submitted original systems and solution 
bundles possibly relying on different (sub)systems. 

In some cases, systems were provided as custom versions compiled on purpose 
for the Competition; in other cases, the executables came from the official 
release sources but were built on the competition machines, and hence might 
differ from the ones officially distributed. We report here the exact versions, 
whenever applicable and explicitly stated by the participants; it is worth 
remembering that, for the sake of reproducibility, all systems and solution bundles, 
together with encodings, instances, scripts, and everything else needed for the 
actual re-execution of the competition, are available on the competition web site 
(Calimeri et al. 2010), where more details on systems and teams can be found as 
well. 



System                          Related Systems/Subsystems 

• clasp (v 2.0.0-RC2)           Gringo (v. 3.0.3) 
• claspD (v 1.1.1)              Gringo (v. 3.0.3) 
• claspfolio (v 1.0.0)          Gringo (v. 3.0.3) 
• idp (custom)                  Gringo (v. 3.0.3), MiniSatID (v. 2.5.0) 
• CMODELS (v 3.81)              Gringo (v. 3.0.3), MiniSat (v 2.0-beta) 
• SUP (v 0.4)                   Gringo (v. 3.0.3) 
• lp2gminisat, lp2lminisat,     Gringo (v. 3.0.3), Smodels (v. 2.34), lpcat (v. 1.18), 
  lp2minisat                    lp2normal (v. 1.11), igen (v. 1.7), lp2lp2 (v. 1.17), 
                                lp2sat (v 1.15), Minisat (v. 1.14), interpret (v. 1.7) 
• lp2diffz3                     Gringo (v. 3.0.3), Smodels (v. 2.34), lpcat (v. 1.18), 
                                l2diff (v. 1.27), z3 (v. 2.11), interpret (v. 1.7) 
• Smodels (v. 2.34)             Gringo (v. 3.0.3) 



Team                            Systems/Subsystems exploited 

• Aclasp                        clasp (custom), Gringo (v. 3.0.4) 
• BPSolver                      B-Prolog (v. 7.1) 
• EZCSP                         ezcsp (v. 1.6.20b26), iClingo (v. 3.0.3), clasp, 
                                ASPM, B-Prolog, MKAtoms (v. 2.10), Gringo (v. 3.0.3) 
• Fast Downward                 Fast Downward (custom) 
• IDP                           Gidl (v. 1.6.12), MiniSatID (v. 2.5.0) 
• Potassco                      clasp (v 2.0.0-RC2), claspD (v 1.1.1), Gringo (v. 3.0.3), 
                                Clingcon (v. 0.1.2) 



Appendix F Detailed result tables for Section 5 

We report here detailed figures of the competition. 

All the graphs plot the number of solved instances (horizontal axis) 
against the time (expressed in seconds) needed by each solution bundle to solve them 
(vertical axis): the slower a line grows, the more efficiently the corresponding solution 
bundle performed. Note that not all the participants solved the same number of 
instances within the maximum allotted time. 
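
As a minimal sketch of how the points of such a plot could be derived from raw measurements, the Python function below assumes (this is an assumption, not something stated above) that the k-th point reports the running time of the k-th fastest solved instance. The name `cactus_points`, the use of `None` for unsolved instances, and the `timeout` parameter are all hypothetical.

```python
def cactus_points(runtimes, timeout):
    """Return (solved_count, time) pairs for one solution bundle.

    `runtimes` holds one entry per instance: seconds taken, or None if unsolved.
    Instances exceeding `timeout` are treated as unsolved.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    # Point k: the k fastest instances were each solved within `solved[k - 1]` seconds.
    return [(k, t) for k, t in enumerate(solved, start=1)]
```

Under this reading, a bundle that solves fewer instances simply yields a shorter line, which matches the remark that participants did not all solve the same number of instances.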

In Figures F3 and F4, participants are ordered by Final score (i.e., $\sum_P S(P)$); 
for each participant, three rows report for each problem $P$: (i) the Score, (ii) the 
Instance quota (i.e., $\sum_P S_{solve}(P)$ or $\sum_P S_{opt}(P)$ for optimization problems), (iii) 
the Time quota (i.e., $\sum_P S_{time}(P)$). 
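
The aggregation behind these tables can be sketched in Python as follows. The sketch assumes that the per-problem score is the sum of the instance quota and the time quota, which is not restated in this appendix and should be checked against the scoring definition given earlier in the paper. The function names and the input layout are hypothetical.

```python
def final_scores(per_problem):
    """Aggregate per-problem quotas into totals for each participant.

    `per_problem` maps participant -> {problem: (instance_quota, time_quota)}.
    Assumption: the per-problem score is instance_quota + time_quota.
    """
    table = {}
    for participant, problems in per_problem.items():
        instance_total = sum(iq for iq, _ in problems.values())
        time_total = sum(tq for _, tq in problems.values())
        table[participant] = {
            "instance": instance_total,
            "time": time_total,
            "final": instance_total + time_total,
        }
    return table


def ranking(per_problem):
    """Order participants by decreasing Final score."""
    table = final_scores(per_problem)
    return sorted(table, key=lambda name: table[name]["final"], reverse=True)
```

For instance, a participant with quotas (50, 20) and (25, 5) on two problems would obtain a Final score of 100 under this assumption.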

For each category, the best performance among official participants is reported 
in bold face. In all tables, an asterisk ('*') indicates that the system/team has been 
disqualified for the corresponding benchmark problem. 

Fig. F1. System Track: Overall Results [Exec. time (y-axis), Solved Instances (x-axis)] 

Fig. F2. M&S Track: Overall Results [Exec. time (y-axis), Solved Instances (x-axis)] 

Fig. F3. System Track - Overall Results 

Fig. F5. Results in Detail: Execution time (y-axis), Solved Instances (x-axis). 

(a) System Track NP 

Fig. F6. Results in Detail: Execution time (y-axis), Solved Instances (x-axis). 

Fig. F7. Results in Detail: Execution time (y-axis), Solved Instances (x-axis). 

(b) Team Track Optimization 

Fig. F8. Results in Detail: Execution time (y-axis), Solved Instances (x-axis). 



